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Abstract 


Me  report  on  the  implementation  and  computational  testing  of 
several  versions  of  a set  covering  algorithm,  based  on  the  family  of 
cutting  planes  from  conditional  bounds  discussed  in  the  companion 
paper  [2] . The  algorithm  uses  a set  of  heuristics  to  find  prime  covers, 
another  set  of  heuristics  to  find  feasible  solutions  to  the  dual  linear 
program  which  are  needed  to  generate  cuts,  and  subgradient  optimization 
to  find  lower  bounds.  It  also  uses  implicit  enumeration  with  some  new 
branching  rules.  Each  of  the  ingredients  was  implemented  and  tested  in 
several  versions.  The  variant  of  the  algorithm  that  emerged  as  best 
was  run  on  55  randomly  generated  test  problems  (20  of  them  from  the 
literature),  with  up  to  200  constraints  and  2000  variables.  The  results 
show  the  algorithm  to  be  more  reliable  and  efficient  than  earlier 
procedures  on  large,  sparse  set  covering  problems. 
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SET  COVERING  ALGORITHMS  USING  CUTTING  PLANES, 


HEURISTICS,  AND  SUBGRADIENT  OPTIMIZATION: 

A COMPUTATIONAL  STUDY 
by 

Egon  Balas  and  Andrew  Ho 

1.  Introduction:  Cutting  Planes  from  Conditional  Bounds. 

In  this  paper  we  report  on  the  implementation  and  computational 
testing  of  an  algorithm,  or  rather  a class  of  algorithms,  based  on  the 
cutting  planes  from  conditional  bounds  introduced  in  [1],  and  also 
using  as  ingredients  several  heuristics  for  finding  feasible  primal 
and  dual  solutions,  subgradient  optimization  of  a Lagrangean  function, 
and  implicit  enumeration  with  some  new  branching  rules. 

The  family  of  cutting  planes  from  conditional  bounds  is  briefly 
described  in  this  section;  a more  detailed  discussion  of  the  properties 
of  these  cuts  is  to  be  found  in  the  companion  paper  [2].  In  section  2 
we  describe  the  general  features  of  the  algorithm  whose  versions  we 
implemented  and  tested.  Sections  3-7  discuss  each  of  the  ingredients 
of  our  procedure,  with  comparisons  of  various  versions  based  on  computational 
testing.  Finally,  section  8 summarizes  our  computational  experience  with 
the  algorithm  as  a whole.  While  the  discussion  focuses  on  a particular 
instance  of  the  algorithms  in  the  class  considered,  namely  the  one  that 
emerged  as  the  most  successful  from  our  computational  experiments,  we 
also  discuss  possible  variations  wherever  it  seems  useful. 

As  one  can  see  from  the  computational  results  presented  in  section  8, 
the  algorithm  discussed  here  is  a reliable  and  efficient  tool  for  solving 
large,  sparse  set  covering  problems  of  the  kind  that  frequently  occur  in 
practice.  With  a time  limit  of  10  minutes  on  the  DEC  20/50,  we  have  solved 
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all  but  one  of  a set  of  50  randomly  generated  set  covering  problems  with 
up  to  200  constraints,  2000  variables,  and  8000  nonzero  matrix  entries  (here 
"solving"  means  finding  an  optimal  solution  and  proving  its  optimality) , 
never  generating  a branch  and  bound  tree  with  more  than  50  nodes.  We 
know  of  no  other  approach  with  comparable  performance.  For  problems 
that  are  too  large  to  be  solved  within  a reasonable  time  limit,  the 
procedure  usually  finds  good  feasible  solutions,  with  a bound  on  the 
distance  from  the  optimum  (for  the  one  unsolved  problem,  this  bound 
was  2.37.).  We  also  tested  the  algorithm  on  a set  of  57.  density  problems, 
but  as  density  increases,  the  performance  of  the  algorithm  tends  to  decline. 

We  consider  the  set  covering  problem 

(SC)  min  [cx|Ax>e,  xe{0,l}n} 


where  A = (a^ 
Let  a*  and  a^ 
M * (l, • . • j®} * 


) is  an  m X n 0-1  matrix,  and  e = (1,..,,1)  has  m components, 
denote  the  i-th  row  and  the  j-th  column  of  A,  and  let 
N * {l,...,n}.  We  denote 


= {ieMla^-1},  jeN;  { j cN ^ j = l } , isM. 


We  will  also  use  the  pair  of  dual  linear  programs 


(L)  min  {cx(Ax>e,  xX>} 

and 

(D)  max  (ue|uA<c,  u>0] , 

associated  with  (SC). 

A vector  xc{0,l]n  satisfying  Ax>e  is  called  a cover,  and  S(x)  - {jcN(xj“l] 
its  support.  A cover  whose  support  is  minimal,  is  prime.  For  a cover  x, 
we  denote  T(x)  * {i*M|a^x«l}. 

The  theory  underlying  the  family  of  cutting  planes  from  conditional 
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bounds  can  be  sunmarized  as  follows  (for  proofs  of  these  statements 
and  further  elaboration  see  the  companion  paper  [2]). 


Let  Zy  be  some  known  upper  bound  on  the  value  of  (SC),  and  let  u be 


any  feasible  solution  to  (D) , with  s=c-uA,  such  that  the  condition 


(1) 


is  satisfied  for  some  SQJ,  S = {j(l) j ( p) } - Further,  let  Q^,  i-l,...,p, 

be  any  collection  of  subsets  of  N satisfying 


(2) 


i|l«QtSj<1>  £5)  ' ^ 


Then  every  cover  x such  that  cx<Zy  satisfies  the  disjunction 


(3) 


v (x  »0,  j cQ. ) . 
1=1  J 


Further,  for  any  choice  of  indices  h(i)eM,  i-l,...,p,  the  disjunction 
(3)  implies  the  Inequality 


(4) 


I x > 1 
jeW  1 


where 


Finally,  if  jtOeQ^,  1*1,..., p,  and  if  x is  a cover  such  that 


SCS(x)  and  lui)  cT(3f>nM^i^ , 1*1,..., p,  then  the  inequality  (4)  cuts  off  x 


and  defines  a facet  of 


P*  " {x*Rn|Ax>e,  E x >1,  x>0,  x Integer,  JcN}. 
JCW  J J 


Using  the  above  results,  one  esn  generate  a sequence  of  cutting  planes 
that  are  all  distinct  from  each  other,  by  generating  a sequence  of  covers 
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x and  feasible  solutions  u to  (D) . The  covers  x provide  upper  bounds, 
while  the  vectors  u provide  lower  bounds  on  the  value  of  (SC).  Since 
every  Inequality  generated  cuts  off  a cover  satisfying  all  previously 
generated  Inequalities,  and  the  number  of  distinct  covers  Is  finite, 
the  procedure  ends  In  a finite  number  of  Iterations,  with  an  optimal 
cover  at  hand. 

2 . Outline  of  the  Algorithm. 

The  algorithm  alternates  between  two  sets  of  heuristics,  one  of 
which  finds  a "good"  prime  cover  x for  the  current  problem  and  a 
(possibly  improved)  upper  bound,  while  the  other  generates  a feasible 
solution  to  (D)  satisfying  (1)  for  S*S(x),  and  from  it  a cutting  plane 
that  cuts  off  x,  as  well  as  a (possibly  improved)  lower  bound.  Whenever 
a disjunction  (3)  is  obtained  with  p-1,  all  the  variables  indexed  by 
are  set  to  0.  The  second  set  of  heuristics  is  periodically  supplemented 
by  subgradient  optimization  to  obtain  sharper  lower  bounds.  Though  this 
procedure  in  itself  must  find  an  optimal  cover  in  a finite  number  of 
iterations,  for  large  problems  this  may  take  too  many  cuts.  Therefore, 
aa  soon  as  the  rate  of  improvement  in  the  bounds  decreases  beyond  a 
certain  value,  the  algorithm  branches. 

A schematic  flowchart  of  the  algorithm  is  given  in  figure  1.  PRIMAL 
designates  the  set  of  heuristics  used  for  finding  prime  covers  (feasible 
primal  solutions),  DUAL  the  heuristics  used  for  finding  feasible  dual 
solutions.  TEST  is  the  routine  for  fixing  variables  at  0.  CUT  generates 
a cutting  plane  violated  by  the  current  cover.  SCRAD  uses  subgradient 
optimization  in  an  attempt  to  find  an  improved  dual  solution  and  lower 
bound.  BRANCH  is  the  branching  routine,  which  breaks  up  the  current 


problem  into  a number  of  subproblems,  while  SELECT  chooses  a new 
subproblem  to  be  processed. 

The  four  decision  boxes  of  the  flowchart  can  be  described  as  follows. 
Let  Zy  and  Zy  be  the  current  upper  and  lower  bound,  respectively,  on  the 
value  of  (SC). 

1.  If  Zy  > Zy,  the  current  subproblem  is  fathomed  (1.1).  If 
Zy  < Zy  and  some  variable  belonging  to  the  last  prime  cover  has  been 
fixed  at  0,  a new  cover  needs  to  be  found  (1.2).  Otherwise,  a cut  is 
generated  (1.3). 

2.  After  adding  a cut,  the  algorithm  returns  to  PRIMAL  (2.1)  unless 

the  iteration  counter  is  a multiple  of  ai  in  which  case  (2.2)  it  uses 

subgradient  optimization  in  an  attempt  to  improve  upon  zT . On  the 

Li 

basis  of  some  experimentation,  we  set  | M | / 10  < a < | M | / 20 . 

3.  If  z > z , the  current  subproblem  is  fathomed  (3.1).  If 
zL  < Zy  but  the  gap  zy  - Zy  has  decreased  by  at  least  e>0  during  the 
last  0 iterations,  we  continue  the  iterative  process  (3.2).  Otherwise, 
we  branch  (3,3).  Again,  following  some  numerical  experiments,  we  use 

C * 0.5  and  0 * 4a,  with  a as  defined  in  2. 

4.  If  there  are  no  active  subproblems,  the  algorithm  stops:  the 
cover  associated  with  zy  is  optimal  (4.1).  Otherwise,  it  applies  the 
iterative  procedure  to  die  selected  subproblem  (4.2). 

In  the  next  five  sections  we  discuss  each  of  the  ingredients  of  the 
algorithm  in  some  detail  on  the  basis  of  computational  testing  of  several 
versions.  After  that  we  report  on  our  computational  experience  with  the 
algorithm  as  a whole. 


3.  Primal  Heuristics. 


The  heuristics  we  use  to  generate  prime  covers  are  of  the  "greedy" 
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type,  in  that  they  construct  a cover  by  a sequence  of  steps,  each  of 

which  consists  of  the  selection  of  a variable  x.  that  minimizes  a 

J 

certain  function  f of  the  coefficients  of  x^ . They  differ  in  the 
function  f used  to  evaluate  the  variables.  If  denotes  the  number 
of  positive  coefficients  of  x^  in  those  rows  of  the  current  constraint 
set  not  yet  covered,  the  general  form  of  the  evaluation  function  is 

fCCj.kj). 

Since  it  is  computationally  cheaper  to  consider  only  a subset  of 
variables  at  a time  and  since  every  row  must  be  covered  anyhow,  i.e.,  the 
cover  to  be  constructed  must  contain  at  least  one  of  the  variables 
having  positive  coefficient  in  any  given  row,  we  restrict  the  choice 
at  each  step  to  the  set  of  variables  having  a positive  coefficient  in 
some  specified  row  i*eM.  Denoting  by  R the  set  of  rows  not  yet  covered 
and  by  S the  support  of  the  cover  to  be  constructed,  the  basic  procedure 
that  we  use  can  be  stated  as  follows. 

Step  0.  Set  R = M,  S = 0,  t = 1,  and  go  to  1. 

Step  1.  If  R = 0,  go  to  2. 

Otherwise,  let  | flR | , choose  i*eR,  and  choose  j(t)  such  that 

'"jctr  kKt))  * ^ f<vV- 

Set  R *-  S «-  Stj{j(t)},  t — 1+1,  and  go  to  1. 

Step  2.  Consider  the  elements  igS  in  order,  and  if  S\(i}  is  the 
support  of  a cover,  set  S *-  S\[i},  When  all  igS  have  been  considered, 

S defines  a prime  cover. 

As  to  the  choice  of  i^  in  Step  1,  the  criterion  that  suggests  itself  is 


|N  | = min  |N  | . 
* ieR  1 
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Rather  than  implement  this  choice  rule  directly,  which  would  be 

costly,  we  approximate  it  by  ordering  once  and  for  all  the  rows  of  the 

initial  coefficient  matrix  A according  to  decreasing  |N  |,  and  always 

choosing  i^  as  the  last  element  of  the  ordered  set  R.  Since  the  cuts 

generated  in  the  procedure  also  tend  to  have  a decreasing  number  of  l's, 

i.e.,  later  cuts  tend  to  have  fewer  l's  than  earlier  cuts,  this  rule 

comes  sufficiently  close  to  choosing  the  row  with  the  minimum  number  of  l's. 

If  the  set  N in  step  1 is  replaced  by  N,  i.e.,  the  choice  is  not 

restricted  to  a certain  row,  and  step  2 is  removed,  i.e.,  the  procedure 

is  allowed  to  stop  whenever  a cover  is  obtained,  whether  prime  or  not, 

then  the  above  procedure  with  f(Cj,10  = c^/k^  is  t^e  greedy  heuristic 

shown  by  Chv^tal  [31  to  have  the  following  property:  if  z is  the  value 

opt 

of  (SC)  and  z^eu  is  the  value  of  a solution  found  by  the  heuristic,  then 

z,  / z < Z T 

heu  opt  - ^ j 

where 

d = max  | M . | , 
jeN  J 

and  this  bound  is  best  possible.  From  a practical  point  of  view  this 
bound  is  very  poor,  and  it  can  be  shown  [7]  that  there  is  no  better  bound 

for  any  function  f used  in  the  above  procedure.  However,  proving  this 

statement  requires  the  construction  of  examples  for  which  the  worst  case 
bound  is  attained,  and  every  function  f requires  a different  example.  This 
suggests  as  a practical  remedy  against  the  poor  worst-case  performance  of 
the  heuristic,  the  intermittent  use  of  several  functions  5 rather  than  a 
single  one.  This  idea  has  been  implemented  and  tested  with  reasonably 
good  results.  The  following  five  functions  f(Cj,kj)  were  considered: 
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(1)  c ; (ii)  c /k  ; (iii)  c /log9k  ; (iv)  c /k.log  k.; 

J J J j J J J 2 j 

(v)  c^/k^ln  k^  . In  cases  (iii)  and  (iv) , lc^k^  is  to  be  replaced 
by  1 when  k^  = 1;  and  in  case  (v)  In  k^  is  to  be  replaced  by  1 when 
k.  = 1 or  2. 

j 

Using  (i)  amounts  to  simply  choosing  the  lowest-cost  variable  at 
each  step.  Criterion  (ii)  minimizes  the  unit  cost  of  covering  an  uncovered 
row.  The  functions  (iii),  (iv)  and  (v)  select  the  same  variable  as 
function  (ii)  whenever  c^  = 1,  VjeN,  but  otherwise  (iii)  assigns  less 
weight,  while  (v)  assigns  more  and  (iv)  even  more  weight  to  the  number 

k.  of  rows  covered,  versus  the  cost  c.. 

J J 

The  five  functions  were  tested  on  a set  of  13  randomly  generated 
problems  with  100  < m < 200,  100  < n < 1000,  and  2%  density.  Though  none 
of  them  emerged  as  uniformly  dominating  any  of  the  others  in  terms  of 
the  quality  of  the  solutions  obtained,  (iii)  scored  best  and  (ii)  second 
best,  in  the  sense  that  of  the  13  problems,  (iii)  gave  the  best  solution 
in  6 cases,  (ii)  in  5 cases.  As  to  the  other  functions,  the  best  solution 
was  found  by  (i)  and  (v)  in  3 cases  each,  and  by  (iv)  in  2 cases  (the  sum 
of  these  numbers  exceeds  13,  since  often  more  than  one  function  gave  a 
best  solution).  Table  1 shows  the  % deviation  from  the  optimum,  of  the 
solution  found  by  each  function  for  each  problem.  The  numbers  in  the 
first  column  are  those  under  which  the  problems  are  listed  in  section  8, 
where  they  are  also  described  in  more  detail.  The  best  solution  found 
by  any  of  the  5 functions  never  deviated  from  the  optimum  by  more  than  10.8%. 

The  above  described  procedure  can  be  amended  so  as  to  produce,  at 
little  extra  cost,  more  than  one  cover,  the  best  of  which  can  then  be 
retained.  This  is  done  as  follows.  We  first  use  the  above  heuristic  to 
find  a cover.  Then  we  consider  the  variables  in  the  order  of  their 
inclusion  into  the  cover,  and  remove  from  the  cover  all  those  that  have  a 
positive  coefficient  in  at  least  one  oversatisfied  constraint.  Next  we 
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Table  1,  Deviation  from  optimum  (In  7°)  of  the  values  found  by  primal  heuristics 


Problem  data 


Function  used  with  heuristic 

c . 

J 

c /k . 

J J 

c /log  k 

J L J 

W°82kJ 

Cj /kj lnkj 

complete  the  cover  again  using  the  heuristic.  We  continue  in  this  fashion 
until  either  a cover  is  generated  which  does  not  oversatisfy  any  of 
the  constraints,  or  the  number  of  covers  generated  is  some  specified 
integer  a.  To  find  the  most  desirable  value  for  a,  we  have  applied 
this  procedure  with  o=10  to  each  of  the  above  described  13  problems, 
with  each  of  the  5 functions  discussed.  Somewhat  surprisingly,  we 
found  that  of  the  13  X 5 = 65  cases,  the  best  of  the  10  covers  generated 
was  the  first  one  in  22  instances,  the  second  one  in  34  instances,  the 
third  one  in  8 instances,  and  in  1 instance  it  was  the  fifth  one.  In 
other  words,  only  once  was  there  an  improvement  after  the  third  cover 
was  found,  and  this  in  spite  of  the  fact  that  the  best  cover  found  was 
not  optimal  in  any  of  tne  65  cases. 

Consequently,  we  set  0*3,  and  use  function  (iii)  for  the  first  cover, 
a different  function  ((i)  or  (ii))  for  the  second  cover,  and  again  a 
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different  function  for  the  third  cover.  This  way  the  procedure 
is  still  computationally  cheap,  and  yields  considerably  better  results 
than  the  version  that  produces  only  one  cover.  We  call  this  procedure 
PRIMAL  1. 

The  general  algorithm  discussed  in  section  1 at  times  fixes 
variables  at  0.  Whenever  some  variable  belonging  to  the  current  cover 
gets  fixed  at  0,  we  have  to  generate  a new  cover.  Rather  than  starting 
from  scratch,  in  such  situations  we  start  with  the  partial  cover  at  hand, 
and  complete  the  cover  by  using  the  procedi  re  discussed  above.  This 
version  of  the  heuristic  we  call  PRIMAL  2. 

When  the  dual  heuristics  to  be  discussed  in  the  next  section  produce 
a vector  (u,s)  such  that  (1)  does  not  nold  for  S=S(x),  where  x is  the 
current  cover,  either  the  dual  solution  u must  be  weakened  (see  the  next 
section),  or  else  the  cover  x must  be  replaced  by  another  one,  say  x, 
such  that  (1)  is  satisfied  for  S=S(x).  PRIMAL  3 was  designed  to 


accomplish  this  starting  with  the  cover  at  hand.  It  introduces  into 


the  cover  additional  variables  x^  such  that  s^X),  in  order  of  increasing 
values  of  f (as  defined  in  (iii)),  and  removes  variables  in  order  of 
increasing  values  of  s^  so  as  to  keep  the  cover  prime.  While  it  is  not 
guaranteed  to  succeed,  the  percentage  of  failure  is  sufficiently  low  to 
justify  the  use  of  PRIMAL  3.  When  it  fails,  the  dual  solution  u must  be 
weakened  as  explained  in  the  next  section. 

Whenever  a new  cut  is  generated,  the  last  cover  satisfies  all  the 


constraints  except  for  the  newly  added  one.  Since  it  is  much  cheaper  to 
obtain  a new  cover  from  the  old  one  than  to  construct  a new  one  from 
scratch,  a special  heuristic  was  Implemented  for  this  purpose.  PRIMAL  4 
adds  to  the  current  cover  a number  a of  variables  with  positive  coefficients 
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in  the  cut  just  generated,  chosen  in  order  of  increasing  c , and  then 
removes  from  the  cover  redundant  variables  so  as  to  make  it  prime. 

Computational  experiments  with  cr=l 5 have  shown  0=2  to  give  the 

best  results. 

Finally,  every  time  we  apply  the  subgradient  method  to  obtain  an 
improved  lower  bound,  we  also  generate  a new  cover  by  using  the  reduced 
costs  Sj  produced  by  that  procedure.  This  is  done  by  subroutine  PRIMAL  5, 
by  setting  x.  = 1 if  s.  = 0,  x.  = 0 otherwise.  The  resulting  vector  either 
is  a cover,  or  else  if  row  i is  uncovered,  then  s^  > 0,  VjeN^,  and  variable  u^ 
can  be  increased  to  u + min  s . This  creates  at  least  one  new  reduced  cost  s 

1 j*N  J 

equal  to  0,  and  for  each  such  k we  set  x^  = 1.  We  proceed  this  way 
until  every  row  is  covered,  after  which  we  make  sure  the  cover  is  prime,  like 
in  Step  2 of  PRIMAL  1.  PRIMAL  5 produces  consistently  better  covers 
(hence  better  upper  bounds)  than  any  of  the  other  4 procedures;  but 
obtaining  the  reduced  costs  by  subgradient  optimization  is  many  times  more 


expensive  than  producing  them  by  the  dual  heuristics,  as  will  be  discussed  below. 

While  the  conditions  for  using  PRIMAL  2,  3 and  5 have  been  spelled 
out,  the  choice  between  PRIMAL  1 and  4 seems  open  at  this  point.  PRIMAL  4 
is  computationally  cheaper,  but  it  often  produces  a cover  that  differs 
very  little  from  the  previous  one.  PRIMAL  1 is  more  expensive,  but 
yields  a genuinely  new  cover.  We  follow  the  strategy  of  using  PRIMAL  1 
to  obtain  the  first  cover,  then  using  PRIMAL  4,  except  for  every  9-th 
iteration,  when  we  again  use  PRIMAL  1.  As  to  the  value  of  9,  computational 
experiments  have  led  us  to  start  with  9=1  and  then  set  9 - min  {9+1,7}. 

In  other  words,  at  the  beginning  we  return  to  PRIMAL  1 more  frequently, 
then  at  regular  intervals  of,  scy,  7 iterations. 

3.  Dual  Heuristics. 

The  purpose  of  the  dual  heuristics  is  to  find,  at  a low  computational 


-13- 


cost,  a feasible  solution  to  the  dual  linear  program  (D)  associated  with 
the  current  problem,  with  as  high  an  objective  function  value  (lower  bound 
on  the  value  of  (D) , hence  of  (SC)),  as  possible.  In  addition,  the  dual 
solution  u and  its  associated  reduced  cost  vector  s=c-uA  have  to  satisfy 
condition  (1)  for  S=S(x),  where  x is  the  current  cover.  Again,  we  use 
a procedure  of  the  "greedy"  type,  which  considers  the  variables  in  some 
prescribed  order  and  assigns  to  each  one  the  maximum  value  that  can  be 
assigned  without  violating  some  constraint  or  changing  some  earlier  value 
assignment.  Since  it  is  known  (see  [2] , Theorem  4)  that  for  a feasible  solution 
u to  (D),  the  vector  s = c - uA  satisfies  (1)  for  S=S(x)  if  u satisfies 
u(Ax-c)=0,  in  considering  the  variables  u^  priority  is  given  to  ieT(x), 
where  T (x)  = (ieMja^x  = l}.  Denoting  by  R the  index  set  of  the  dual 
variables  (rows)  not  yet  assigned  a value  (ordered  in  accordance  with  M) , 
the  basic  procedure  is  as  follows. 

Step  0.  Set  R = Mf)T(x),  s=c,  t=l , and  go  to  1. 

Step  1.  If  R = 0,  go  to  2.  Otherwise,  let  ICR,  choose  i(t) 
such  that 


= min  |N  I , 
iel 


and  let 


"HH  ‘ "i" 


Then  set 


. . ; ' ui(t>  ^i(t) 

1 sj  J«N\N1(c), 

R - R\{i(t)},  t •-  t + 1,  and  go  to  1. 

Step  2.  If  Step  2 is  entered  for  the  first  time,  set  R - M\T(x)  and 

go  to  1.  Otherwise,  stop:  u is  a feasible  solution  to  (D). 

Restricting  the  choice  of  i(t)  to  a subset  I of  R has  the  sole 
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purpose  of  making  this  choice  computationally  less  costly,  at  the 
risk  of  sacrificing  some  quality.  Since  the  rows  of  the  original  matrix 
A are  ordered  according  to  decreasing  |N^(  and  the  cuts  generated  also 
tend  to  have  progressively  fewer  l's,  we  define  I as  the  union  of  (1) 
the  last  element  of  where  is  the  row  Index  set  of  the  original  constraint 

matrix,  and  (li)  the  last  X.  elements  of  R,  where  X * min  {|Rl,  V"|mo0* 

This  makes  sure  that  the  choice  rarely  --  if  at  all  --  misses  the  minimum 
of  (NjJ  over  all  i*R.  This  procedure  we  call  DUAL  1. 

A feasible  solution  to  the  current  (D)  remains  feasible  after  adding 
a new  cut  to  (SC),  i.e.,  a new  column  to  the  constraint  set  of  (D) , but 
usually  ceases  to  be  feasible  if  the  new  variable  is  assigned  a positive 
value.  On  the  other  hand,  if  the  new  variable  is  set  to  0,  the  solution 
remains  unchanged.  Furthermore,  if  a new  solution  is  generated  from 
scratch  by  the  same  heuristic,  it  is  often  identical  to  the  previous  one. 

DUAL  2 is  a version  of  the  heuristic  that  starts  with  the  infeasible 
solution  obtained  from  the  last  feasible  solution  u by  assigning  a value 
of  1 to  the  new  variable  associated  with  the  last  cut.  This  guarantees 
that  the  dual  solution  to  be  obtained  will  differ  from  all  the  previous 
ones.  Next  the  remaining  variables  are  considered  in  a certain  order,  and 
set  to  0 (or,  in  the  case  of  the  last  variable,  to  the  maximum  allowable 
value),  until  the  solution  becomes  feasible.  The  order  in  which  the 
variables  u^  are  considered  is  that  of  decreasing  number  of  positive 
coefficients  in  constraints  of  (D)  that  are  violated,  with  priority  given 
to  variables  u^  corresponding  to  primal  constraints  that  are  oversatisfied 
by  the  current  cover. 

Finally,  a vector  (u,s)  generated  by  DUAL  1 or  DUAL  2 may  violate 
condition  (1)  for  S-S(x),  where  x is  the  current  cover.  In  such  cases  the 


* 


-15- 


i 

i 


' 


algorithm  goes  to  PRIMAL  3,  in  an  attempt  to  find  a new  cover  x such 
that  (1)  holds  for  S«S(x).  Though  our  computational  experience  has  been 
that  PRIMAL  3 seldom  fails  to  find  such  a cover,  failure  does  sometimes 
occur.  In  such  cases  we  use  DUAL  3 to  modify  the  solution  u,  while 
weakening  it  as  little  as  possible,  so  as  to  satisfy  (1).  DUAL  3 
considers  the  variables  u^ , ieM\T(x),  in  decreasing  order  of  |N^0S(x)|,  and 
successively  reduces  their  value  until  (1)  is  satisfied  for  S-S(x).  Since 
this  latter  condition  always  holds  when  all  u^,  ierf\T(x),  are  set  to  0, 
the  procedure  always  ends  with  a solution  having  the  required  property. 

While  DUAL  3 is  used  under  clearly  defined  circumstances,  DUAL  1 and 
DUAL  2 can  be  used  intermittently.  DUAL  2 is  computationally 
cheaper  than  DUAL  1 and  guarantees  a new  solution,  but  DUAL  1 Is 
more  likely  to  produce  an  improved  lower  bound.  We  start  with  DUAL  1 
and  then  use  DUAL  2,  except  for  every  u_th  iteration,  when  we  again  use 
DUAL  1.  Based  on  some  computational  experimentation,  we  set  ^*4 . 

4.  Subgradlent  Optimization. 

While  the  dual  heuristics  provide  reasonably  good  feasible  solutions 
to  (D)  at  a low  computational  cost,  a sharper  lower  bound  could  of  course 
be  obtained  by  solving  to  optimality  the  linear  program  (D).  After 
sufficient  cuts  have  been  added  to  the  constraint  set  of  (SC),  the  value 
of  the  current  problem  (D)  may  exceed  z^,  thus  bringing  the  procedure  to 
an  end.  However,  the  computational  effort  involved  in  solving  (D)  by  the 
simplex  method  is  considerable,  and  increases  at  least  quadratically  with 
the  number  of  cuts  added  to  the  constraint  set  of  (SC).  On  the  other  hand, 
one  can  use  subgradient  optimization  to  find  an  optimal  or  near-optimal 
solution  to  (D)  at  a computational  cost  that  seems  to  Increase  only  linearly 
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with  the  number  of  cuts  added.  This  Is  what  we  are  doing  periodically 
in  order  to  generate  lower  bounds  stronger  than  those  obtained  by  the  dual 
heuristics.  Though  the  subgradient  method  consistently  produces  a stronger 
bound  than  the  heuristics,  the  cuts  derived  from  the  dual  solution  obtained 
this  way  tend  to  be  weaker  than  those  derived  from  the  heuristically 
generated  dual  solutions.  This  is  so  because  the  reduced  costs  obtained 
by  the  subgradient  method  do  not  usually  satisfy  (1),  and  the  dual  solution  u 
(together  with  the  bound  ue)  has  to  be  weakened  in  order  to  get  (1)  satisfied. 
The  subgradient  method  that  we  use  is  a specialization  of  the 

general  procedure  discussed,  for  instance,  in  [61  or  [51.  We  wish  to  find 
or  approximate 

max  min  L(x,u)  = cx  + ue  - uAx, 
ueF  xeG 

where  F and  G are  suitable  relaxations  (supersets)  of  the  feasible  sets 
of  (D)  and  (L)  respectively,  i.e., 

F (u(uA  < c,  0 < u^  < U , Vi} 

and 

G 3 (x|Ax  > e,  0 < x <1,  Vj}, 

where  u * min  c , Vi. 

J*Nt  j 

1. 

During  every  subgradient  iteration,  given  the  current  vector  u , we 
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solve  the  problem 


(6) 


min  L(x,u  ) , 
x*G 


k k k 

and  if  x(u  ) denotes  an  optimal  solution  to  (6),  we  put  d = e - Ax(u  ), 

k+1  D , k ,k. 
u * Pp(u  + t^d  ) . 

k k 

Here  the  direction  vector  d is  a subgradient  of  L(x,u)  it  u * u , the 

scaler  t^  is  the  step  length,  while  Pp(g)  is  the  projection  of  the 

vector  g on  F.  The  step  length  is  of  the  form 

(z  - L(x(uk)  ,uk)  ) 

tk 

where  0 < X,  < 2 and  the  double  bars  denote  the  Euclidean  norm;  and  if 

we  take  the  relaxation  of  the  feasible  set  of  (D)  to  be  F * (u|0<u^u}, 

Ic  k 

then  the  projection  of  g»u  +t^d  on  F is 

f*i  i£h>"i 

PF(gt)  * j if  0 < gt  < Ul 

V.0  if  gt  < o. 

We  tried  two  different  relaxations  G of  the  feasible  set  of  (L) . 

The  first  one, 


Gx  =*  {xeRn|0  < x < 1,  jcN}, 


makes  the  solution  of  (6)  trivial:  the  optimum  is  attained  at 

0 if  < cj 

*,(0  J e[0,l]  if  uka 


J 


J J 

if  u"aj  > c^ 


For  jeN  such  that  u a^  " cj»  wtiere  any  xj  c[ 0 , 11  is  optimal,  we  have  tried 
to  set  Xj  - 0,  1/2  and  1,  and  Xj  ■ 1 gave  on  the  average  slightly  better 


I 


results;  so  this  is  what  we  use. 

For  the  second  relaxation,  we  choose  a (maximal)  subset  M c M such 
that  Vi,k«M,  ij*k,  and  define 


( 

l xeRn 


Z x > 1,  ieM 


^ 2 <0 

0 < x < 1,  j«N 


where 


C |M.  I - 1 if  jc  U_N 
i - { J ieM  1 

J ^ | otherwise 


and  dQ  - | M\M | . 

The  idea  of  using  the  inequalities  defined  by  a family  of  disjoint 
subsets  C S,  ieM,  is  borrowed  from  J.  Etcheberry  f4] . The  extra 
inequality  that  we  add,  which  is  the  sum  of  the  remaining  inequalities 
of  Ax  > e,  usually  makes  G2  considerably  more  constrained. 

While  G 2 is  a tighter  relaxation  than  G^,  it  is  also  one  for  which 
solving  (6)  is  considerably  more  expensive.  We  therefore  restrict  ourselves, 
when  using  G2,  to  approximating  the  optimum  of  (6)  by  a fast  heuristic. 

In  this  version,  the  subgradient  procedure  using  G,  is  about  1.2-1. 3 
times  more  time  consuming  than  the  one  using  G^,  but  it  also  tends  to  be 
more  reliable  and  to  occasionally  produce  a slightly  better  bound.  A 
comparison  on  5 randomly  generated  200  x 2000  problems  (with  8000  nonzero 
matrix  entries)  showed  the  version  using  G2  to  generate  0.61  times  the 
number  of  nodes  (of  the  search  tree)  generated  by  the  version  using  G^, 
and  to  require  0.85  times  the  amount  of  total  time  required  by  the  latter. 
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Currently  the  main  version  of  our  algorithm  uses  G^. 

We  have  tested  various  strategies  for  choosing  the  value  of  the 
parameter  in  the  definition  of  the  step  length  t^,  and  ended  up  with 
setting  = 2 at  the  start,  and  then  dividing  the  current  by  1/2  whenever 
there  is  no  improvement  in  the  value  of  the  Lagrangean  for  7 consecutive 
iterations  of  SGRAD. 

To  start  the  subgradient  optimization  procedure,  one  needs  an  initial 
solution  u^.  We  use  for  this  purpose  the  vector  u obtained  from  the  dual 
heuristics  when  we  apply  SGRAD  the  first  time  to  a problem;  then  at  subsequent 
applications  of  SGRAD  we  use  as  u^  the  dual  solution  obtained  in  the  last 
application  of  SGRAD,  which  is  usually  considerably  better  than  the  one 
obtained  from  the  dual  heuristics.  The  quality  of  the  starting  solution 
apparently  makes  a great  difference  in  the  computational  effort  involved  in 
SGRAD:  the  first  application  of  SGRAD  takes  about  6 times  the  computational 

effort  required  by  subsequent  applications  to  the  same  problem  (amended  with 
cuts)  . 

As  to  the  overall  usefulness  of  the  subgradient  method  in  our 
algorithm,  our  experience  has  been  that  though  it  is  computationally  more 
expensive  than  the  dual  heuristics  by  1 and  often  2 orders  of  magnitude, 
subgradient  optimization  nevertheless  pays  off.  On  the  one  hand,  it 
produces  consistently  better  lower  bounds  than  the  heuristics,  by  a 
margin  that  tends  to  increase  with  problem  size;  on  the  other,  it 
provides  a set  of  reduced  costs  that  can  be  used  by  PRIMAL  5 to  generate 
consistently  better  covers,  and  hence  better  upper  bounds,  than  the  other 
primal  heuristics.  These  findings  are  illustrated  in  Table  2.  The 
problems  listed  there  are  all  randomly  generated,  2%  density  set 
covering  problems,  described  in  more  detail  in  section  8. 
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Table  2.  Improvement  in  bounds  (in  %)  due  to  SGRAD  a nd  PRIMAL  5 . 


Problem  data 


No. 

m 

n 

2.2 

200 

413 

2.3 

200 

300 

2.4 

200 

500 

3.1 

100 

100 

3.2 

100 

200 

3.3 

100 

300 

3.4 

100 

400 

3.5 

100 

500 

3.6 

100 

600 

3.7 

100 

700 

3.8 

100 

800 

3.9 

100 

900 

3.10 

100 

1000 

5.1 

200 

2000 

5.2 

200 

2000 

5.3 

200 

2000 

5.4 

200 

2000 

5.5 

200 

2000 

5.6 

200 

2000 

5.7 

200 

2000 

5.8 

200 

2000 

5.9 

200 

2000 

5.10 

200 

2000 

Improvement  (%) 


Lower  bound  Upper  bound 


SGRAD 
versus 
DUAL  1 


8.18 
3.3 
11.2' 


PRIMAL  5 
versus 
PRIMAL  1 


SGRAD 

iterations 
to  obtain 
lower 
bound 


1.21 

1.57 

1.53 

3.69 

41 

23 

4.90 

3.05 

82 

6.07 

9.30 

80 

0.61 

9.08 

44 

7.39 

3.65 

178 

9.35 

3.63 

139 

15.13 

3.15 

164 

2.87 

6.77 

263 

6.58 

2.11 

93 

13.82 

8.90 

93 

15.98 

0 

176 

15.90 

6.22 

96 

14.95 

0.78 

148 

15.90 

6.19 

97 

12.38 

5.31 

99 

19.13 

0.97 

146 

16.25 

7.89 

183 

8.74 

3.37 

123 

12.29 

6.03 

92 

-h+VW*. 
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Table  3.  Average  time  per  application  of  DUAL  1 or  2 and  SGRAD 


DUAL  1 or  2 


SGRAD 


No.  of 

times 

used* 

Total 

time** 

spent 

Average 
time 
for  one 
application 

1,638 

207.4 

0.127 

i 

No.  of 

times 

used* 

Total 

time** 

spent 

Average 
time 
for  one 
application 

181 

623.3 

3.44 

* Total  for  all  23  problems  of  Table  2 
**  DEC  20/50  seconds 


Table  4.  Cutting  plane  algorithm  with  and  without  SGRAD 


With  SGRAD* 


cuts 


Problem  data 

No. 

m 

— i 

n 

1.1  15  32 

30  30 

30  40 

30  50 

30  60 

30  60 


1.10  30  80 


Without  SGRAD 

No.  of 

Time** 

cuts 

3 

0.30 

12 

0.58 

14 

0.77 

28 

1.44 

115 

9.44 

77 

4.75 

>438 

>120.00 

>480 

>120.00 

211 

23.35 

47 

3.40 

>406 

>120.00 

>496 

>120.00 

* SGRAD  applied  at  first  and  every  10-th  iteration 

**  DEC  20/50  seconds 
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Table  3 shows  the  average  time  needed  for  one  application  of 
the  heuristic  DUAL  1 or  2,  and  one  application  of  SGRAD,  with  the 
averages  taken  over  all  applications  to  all  of  the  23  problems  of 
Table  2.  The  comparison  shows  SGRAD  to  be  about  27  times  (3.44:0.127  = 27) 
as  time  consuming  as  DUAL  1 or  2.  The  factor  27  is,  however,  an  average: 
as  mentioned  earlier,  the  first  application  of  SGRAD  to  a problem  is 
about  6 times  as  expensive  as  subsequent  applications  to  the  same 
problem  (with  added  inequalities);  in  particular,  the  first  application 
of  SGRAD  requires  about  100  times  more  time  than  an  application  of 
DUAL  1 or  2,  while  subsequent  applications  to  amended  versions  of  the 
same  problem  require  on  the  average  15  times  as  much  time  as  DUAL  1 or  2. 

Finally,  to  support  our  contention  that  in  spite  of  these  great 
time  differences  the  use  of  SGRAD  pays  off,  we  show  in  Table  4 the 
outcome  of  the  cutting  plane  procedure  with  and  without  SGRAD,  on  a 
set  of  12  set  covering  problems  taken  from  the  literature  and  described 
in  section  8.  These  problems,  all  of  which  except  for  1.1  have  unit 
costs,  were  particularly  hard  for  the  cutting  plane  algorithm  without 
SGRAD,  which  failed  to  solve  4 of  them  within  a time  limit  of  2 minutes. 

6 . Fixing  Variables  and  Generating  Cuts. 

Every  time  a new  solution  u to  (D)  is  obtained  either  by  one  of  the 
heuristics  or  by  subgradient  optimization,  the  algorithm  searches  for 
variables  x such  that  s^  > zy  - ue,  and  fixes  them  at  0,  removing  the 
corresponding  indices  from  N. 

Intuitively,  one  would  be  inclined  to  think  that  this  feature  of 
the  algorithm  becomes  operative  only  after  many  iterations,  when  the 
g*P  zu-zl  *ias  keen  narrowed  down  considerably.  This,  however,  was  not  the 
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case  on  the  randomly  generated  problems  that  we  solved.  Substantial 
numbers  of  variables  were  usually  fixed  at  0 quite  ea-l.y  in  the  procedure, 

and  by  the  time  the  first  branching  occurred,  the  number  of  variables  left 

was  almost  always  close  to  the  number  m of  initial  constraints,  as  can  be 
seen  from  the  data  of  section  8. 

To  generate  a cut,  the  algorithm  uses  a subroutine  that  implements 
the  procedure  discussed  in  [2].  Let  x be  the  current  cover,  S(x)  and  T(x) 
defined  as  above,  and  the  current  upper  bound. 

Step  0.  Set  W = 0,  S = { j eS (x) | s^ X)} , y = ue,  t = 1,  and  go  to  1. 

Step  1.  Let 

v = min  {max  s.,  min  {s  |s.  > z - y}}, 
jeS  J jeS  J J U 

J = fJes  Is,  = v },  Q = {jeN|s  > v },  M = U M.. 

J J J jeJ  J 

Choose  i(t)  such  that 


|NW  .\QLW|  = min  |N  \QUW| 
( ’ ieT(x)(TiJ  1 


and  let  {j  (t)  } = 


Then  set  W - WU(N^  ^\Q) , y •-  y + s^(t). 


If  y > *,,,  go  to  2. 


Otherwise  set  S — S\{j(t)}, 


SJ  - { 


Sj  ' Sj(t) 


JeQPNi(t) 
otherwise , 


t — t + 1,  and  go  to  1. 

Step  2.  Add  to  (SC)  the  inequality 


E x > 1. 
jeW  J 

This  procedure  terminates  after  a number  of  iterations  equal  to  the 
number  of  jeS(x)  such  that  s^  > 0,  with  an  inequality  satisfied  by  every 


m 
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cover  better  than  the  one  associated  with  z^,  and  violated  by  the  last 
cover  x.  Typically,  the  cuts  tend  to  become  successively  stronger  during 
the  procedure,  the  last  few  cuts  often  having  just  one  or  two  l's.  The 
total  number  of  cuts  required  to  solve  a m x n problem  increases  with  both 
m and  n.  For  the  randomly  generated  sparse  problems  solved  in  our  experiment 
the  number  of  cuts  needed  was  typically  less  than  3m  or  n/3,  as  can  be 
seen  from  the  results  of  section  8.  This  of  course  refers  to  the  number 
of  cuts  required  when  the  cuts  are  used  within  the  framework  of  our 
algorithm  in  the  class  discussed  here,  which  also  uses  implicit  enumeration. 
The  cuts  by  themselves,  without  branching  (but  with  periodic  use  of  SGRAD) 
were  able  to  solve  all  of  the  20  test  problems  from  the  literature  that  we 
could  obtain,  and  all  but  one  of  the  10  test  problems  with  m = 100  and 
100  < n < 1000  that  we  generated,  as  shown  in  section  8.  As  to  the  larger 
problems,  six  of  the  ten  200  x 1000  problems  and  four  of  the  ten  200  x 2000 
problems  were  solved  by  cutting  planes  only,  without  branching  (see  section  8 
for  details). 

7 . Branching  and  Node  Selection. 

As  mentioned  in  section  2,  we  branch  whenever  the  gap  z^  - z^ 

decreases  by  less  than  e = 0.5  during  a sequence  of  4a-  iterations,  where  a 

is  the  frequency  of  applying  SGRAD  (every  a-th  iteration).  The  actual 

rule  we  use  is  slightly  more  sophisticated:  if  for  3a  iterations  the 

decrease  in  z - z is  less  than  e = 0.5,  then  we  branch  at  the  first 
U L 

iteration  where  the  current  (say  the  k-th)  dual  solution  u gives  a bound 
greater  than  a weighted  sum  of  the  earlier  bounds,  namely  where 
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If  this  requirement  is  not  met  for  any  of  the  next  a iterations, 
and  the  decrease  in  the  gap  is  still  less  than  e = 0.5,  then  (i.e.,  after 
a total  of  4(*  iterations)  we  branch. 

We  use  two  branching  rules  intermittently.  The  first  one  is 
based  on  a disjunction  (3),  which  is  strenghened  to  (7)  when  used  for 
branching,  so  as  to  partition  the  feasible  set: 

P 

(7)  V (x  =0,  jcQ  ; I x.  > 1,  k=l, . . . ,i-l) 

i-1  J j«Qk  J 

The  sets  for  the  disjunction  (7)  are  constructed  by  a procedure 
similar  to  Step  1 of  the  cut  generating  routine.  The  use  of  the  same 
criterion  (of  minimizing  N^^XQ^)  as  *-n  c^e  cut  generating  routine  is 
motivated  by  the  attempt  to  guarantee  that  each  subproblem  created  by  the 
branching  will  have  at  least  one  inequality  with  as  few  l's  as  possible. 

The  second  branching  rule  is  a variation  of  the  one  proposed  by 
J.  Etcheberry  [4].  We  choose  two  row  indices  i,keM,  such  that  i is  the 
last  element  in  the  ordered  set  M,  and  k / i,  with 


|N  flN  | = min  | N ON 

N.HN^O  n 1 
h i 


and  then  branch  on  the  disjunction 


(8) 


(x  -0,  jeN.ON  ) V ( Z x > 1). 
J JeNinNk  J 


Whenever  |N  (IN | = 1,  which  is  usually  the  case,  the  second  term 

1.  K 

becomes  x =1,  where  fj}-N.  PN,  , and  (8)  becomes  a special  case  of  the  usual 
j J i k 

branching  dichotomy  (Xj=0)v(Xj=1) . However,  a comparison  of  the  rule  based  on 
(8)  with  the  usual  branching  dichotomy  combined  with  a choice  rule  for  the 
branching  variable  different  from  the  one  used  here,  has  shown  (8)  to 
be  on  the  average  considerably  better. 


i 
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The  choice  between  rule  1 (using  the  disjunction  (7))  and  rule  2 
(using  the  disjunction  (8))  is  dictated  by  the  following  considerations. 

Since  none  of  the  two  rules  dominates  the  other,  i.e.,  at  certain  nodes  it 
may  be  more  advantageous  to  use  rule  1,  whereas  at  other  nodes  (of  the 
same  search  tree)  rule  2 may  be  preferable,  we  introduce  a measure  of 
relative  efficiency  of  rule  1 (as  compared  to  the  usual  dichotomy),  and  then 
choose  rule  1 whenever  it  meets  the  efficiency  requirements.  With  the 

traditional  dichotomy,  the  k-th  level  of  the  (binary)  search  tree  contains 

Ic  Ic 

2 nodes,  with  k variables  fixed  at  each  node,  i.e.,  a total  of  k2  variables 

\£ 

fixed.  In  other  words,  in  order  to  fix  f = k2  variables  by  generating  a 

k 

breadth-first  search  tree,  one  has  to  break  up  the  feasible  set  into  p=2 
subsets.  Substituting  for  k in  the  expression  for  f,  we  find  that  with  the 
usual  dichotomy,  breaking  up  the  feasible  set  into  p subsets  (all  on  the 
same  level  of  the  search  tree)  makes  it  possible  to  fix  a total  of  f = plog2p 
variables.  Therefore,  in  order  to  use  branching  rule  1,  i.e.,  the 
disjunction  (7),  which  breaks  up  the  feasible  set  into  p subsets,  we 
require  that 
P 

(i)  1 1 Q 1 1 > <Pplog2p, 

i.e.,  that  the  number  of  variables  fixed  on  all  branches  be  greater  than 
V times  plog2p.  As  to  the  value  of  the  parameter  9,  after  some  experimentation 
we  found  that  9 < 1 implies  that  disjunction  (7)  will  be  preferred  to  (8) 
about  2/3  of  the  time,  while  cp  > 3 implies  the  opposite.  A judicious 
mix  of  the  two  rules  requires  1 < V < 3.  The  current  version  of  the 
algorithm  uses  9 = 1. 

Besides  condition  (i) , we  have  also  found  it  useful  to  require  that  (ii) 
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there  be  at  most  one  singleton  among  the  sets  Q^,  1*1,..., p,  and  that  (iii)  p 
not  exceed  a specified  constant,  which  we  usually  set  to  8.  Whenever 
conditions  (i) , (ii)  and  (iii)  are  met,  we  use  disjunction  (7);  otherwise 
we  use  disjunction  (8). 

Our  node  selection  rule  is  LIFO;  i.e.,  whenever  available,  we  choose 

one  of  the  nodes  created  by  the  last  branching.  When  rule  1 is  used,  we 

choose  the  p nodes  created  by  disjunction  (7)  in  order  of  decreasing  |Q^|; 

and  when  rule  2 is  used,  we  choose  first  the  node  defined  by  x =0,  jeN  flN.  , 

j l k 

then  the  other.  When  a node  is  fathomed,  we  first  look  for  an  unfathomed 
brother  node,  and  if  none  is  available,  we  go  to  the  father  node. 

Table  5 contains  information  concerning  our  branching  rules.  The 
problems  listed  are  all  those  among  the  200  X 1000  and  200  x 2000,  27.  density 
problems,  (i.e.,  among  the  problems  in  sets  4 and  5),  whose  solution 
required  branching.  The  criterion  for  choosing  the  branching  rule  was 
the  one  described  above,  with  cp  = 1.  For  each  problem,  the  table  gives 
the  number  of  branchings  according  to  rule  1 and  rule  2.  Further,  each 
branching  according  to  rule  1 is  described  by  a sequence  of  numbers  in 
parentheses;  where  the  length  of  the  sequence  is  the  number  of  branches 
created,  and  each  number  in  the  sequence  is  the  number  of  variables  fixed 
at  0 on  the  corresponding  branch.  Thus,  for  solving  problem  4.4  (with 
m » 200,  n = 1000,  density  27.),  the  code  branched  3 times,  using  each  time 
rule  1 (disjunction  (7)).  The  first  branching  created  5 new  nodes 
(subproblems),  with  58  variables  fixed  at  0 on  the  first  branch,  53  on  the 
second,  44  on  the  third,  etc. 

A typical  search  tree  is  illustrated  in  Fig.  2:  it  is  the  one 
corresponding  to  problem  5.1.  The  symbol  ^^means  a branching  based  on 
rule  2.  The  numbers  on  the  arcs  stand  for  the  variables  fixed  by  branching 
rule  1.  At  node  0,  the  algorithm  generates  30  cuts,  then  branches  according 
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Table  5.  Information  on  the  use  of  the  two  branching  rules 


! Problem  data 

L 

No.  of  branchings 
according  to 

No.  of  variables  fixed  on  each  branch, 
for  every  branching  according  to 

No. 

m 

n 

Rule  1 

Rule  2 

Rule  1 

3.4 

200 

1000 

3 

0 

(58,53,44,15,6),  (15,7,5),  (15,6,5,4) 

3.6 

200 

1000 

6 

i 

(20,10,8,7,6,1),  (24,12,10,9,5,3,1), 

(7, 5, 2, 2),  (14,4,2,1),  (16,14,11,4,4),  (6,1) 

3.8 

200 

1000 

2 

2 

(26,13,7,5,2,2),  (8,7,1) 

3.9 

200 

1000 

6 

5 

(40.21.10.9.8.4.2) ,  (18,15,10,3,2,1), 

(14.13.13.13.2) ,  (15,4,2),  (25,1),  (10,3,1) 

4.1 

200 

2000 

5 

3 

(42,24,1),  (15,14,14,9,3,3,2,1), 
(32,24,19,9,2),  (8,6,2),  (16,8,3,3) 

4.4 

200 

2000 

9 

3 

(35,33,33,27,6),  (50,26,3,2),  (40,6,5,2,1), 

(16.10.5.4.1) ,  (20,16,10,5,1),  (41,12,9,5,3) 

(7. 7. 6. 2.1) ,  (21,4,2),  (30,16,5,3,2) 

4.7 

200 

2000 

2 

4 

(5,1),  (36,11,5,5) 

4-8 

200 

2000 

4 

2 

(35,9,3,3,2,1),  (23,8,8,3,2),  (9, 4, 4, 2) 
(20,13,9,5,3,3,3,2) 

|4.9 

1 

200 

2000 

1 

0 

(44,20,17,10,3,2) 

to  rule  2,  creating  2 new  nodes.  Prior  to  branching,  the  upper  and  lower 
bounds  are  2^*256  and  z^=250.58  respectively,  and  there  are  204  variables, 
i.e.,  1796  variables  have  been  fixed  at  0.  Next  the  algorithm  selects  node  2, 
generates  another  30  cuts,  improves  the  lower  bound  to  z^*250.83,  and  fixes 
another  9 variables  before  branching,  leaving  195.  This  time  the  branching 
is  according  to  rule  1,  and  3 new  nodes  are  created,  with  1,  24  and  42  new 
variables  fixed  at  nodes  3,  4 and  5 respectively.  Node  5 is  chosen  next,  etc. 


8.  Computational  Experience  with 
the  Algorithm  as  a Whole. 

We  have  tested  the  algorithm  as  a whole  on  6 sets  of  problems,  that 
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we  now  describe.  The  problems  are  labelled  with  two  numbers  separated 
by  a dot.  The  first  number  is  the  set  to  which  the  problem  belongs,  the 
second  one  distinguishes  the  problems  within  the  same  set.  Thus  2.3  is  the 
third  problem  in  set  2, 

Sets  1 and  2,  containing  12  and  8 problems  respectively,  are  from 
Salkin  and  Koncal  [9],  who  account  for  their  origin  as  follows.  Problem  1.1 
was  obtained  from  C.  E.  Lemke,  problem  1.8  from  IBM  Buffalo,  and  the 
remaining  problems  in  set  1 from  A.  M.  Geoffrion.  All  the  problems  in 
set  1,  except  for  1.1,  have  unit  costs.  They  all  have  randomly  generated 
coefficient  matrices  with  77,  density. 

Problem  2.1  is  attributed  by  Salkin  and  Koncal  [9]  to  American  Airlines, 
problems  2.6  and  2.7  to  IBM  New  York,  while  the  remaining  problems  in  set  2 
were  randomly  generated  by  H.  M.  Salkin.  The  problems  in  set  2 have  coefficient 
matrices  whose  density  varies  between  2%  and  117,,  and  randomly  generated  costs 
in  the  range  [0,99] . 

Sets  3,  4,  5 contain  10  problems  each,  randomly  generated  by  the  second 
author,  with  coefficient  matrices  of  27,  density,  subject  to  the  requirement 
that  every  column  has  at  least  one,  and  every  row  at  least  two,  nonzero 
entries.  The  costs  are  randomly  generated  from  the  range  [1,100] . 

Finally,  set  6 contains  5 problems,  also  randomly  generated  by  the 
second  author  subject  to  the  same  conditions,  with  costs  from  the  same  range, 
but  with  coefficient  matrices  of  57,  density. 

Table  6 compares  the  performance  of  our  algorithm  with  two  other 
procedures,  that  of  Salkin  and  Koncal  [9],  and  of  Lemke,  Salkin  and  Spielberg 
[8],  on  the  20  Salkin-Koncal  problems  (sets  1 and  2).  The  procedure  used  by 
Salkin  and  Koncal  is  Gomory's  all  integer  cutting  plane  algorithm,  while  the 


» -« i 


| 

J 


I 


I 


Table  6.  Comparison  of  algorithms  on  20  problems  from  the  literature 


Salkin- 
Koncal  [9] 
Time* ** *** 


Lemke- 
Salkin- 
Spielberg  [8] 
Time** 


Algorithm  of  section  2 
(without  branching) 


0.78 

16.33 

2.47 


Problem  data 


No.  m 


15 
30 
30 
30 
30 
30 
30 
30 

30  80 

30  80 

30  90 

1.12  30  90 

| 2.1  104  133 

j 2.2  200  413 

!2.3  200 

2.4  200 

j 2.5++  50 
j 2 . 6 36 

•2.7  46 

1 2.8  50 


* UNIVAC  1108  seconds  (about  the  same  speed  as  DEC  20/50) 

**  IBM  360/50  seconds  (4-5  times  slower  than  DEC  20/50) 

***  DEC  20/50  seconds 

+ Average  time  for  the  two  problems  of  the  same  size 
++  Time  limit  exceeded 


29 

2.00 

4- 

0 

20.4 

0 

+ 

0 

31.2 

6 

4. 

0 

30.6 

72 

6.60 

424.0 

22 

8.03 

625.9 

6 

12.00 

461.3 

0 

6.10 

803.5 

0 1 

6.23 

144.5 

0 

2.06 

35.5 

o 

1.76 

56.9 

10 

5.94 

670.0 

0 

5.12 

0.46 

0.46 

0.63 

1.01 

0.36 

0.96 


16  0.79 
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algorithm  of  Lemke,  Salkin  and  Spielberg  is  a specialized  implicit 
enumeration  procedure,  with  an  imbedded  linear  program.  Our  algorithm 
solved  each  of  these  problems  without  branching,  and  on  the  larger  problems 
of  set  2 its  performance  was  an  order  of  magnitude  better  than  that  of  the 
other  two  procedures.  Note  that  about  1/2  of  the  total  time  (in  some 
cases  more,  in  others  less  than  1/2)  was  spent  on  SGRAD.  The  number  of 
times  the  subgradient  procedure  was  used  can  be  calculated  by  dividing  the 
number  of  cuts  by  10,  and  adding  1 to  the  result . The  rest  of  the  time 
was  spent  on  primal  and  dual  heuristics  and  cut  generation. 

Note  also  that  7 of  the  12  problems  in  set  1,  and  5 cf  the  8 problems 
in  set  2,  did  not  require  any  cuts.  This  does  not  necessarily  imply  that 
the  linear  programming,  relaxation  of  (SC)  had  an  integer  solution  in  these 
cases,  but  it  does  imply  that  the  gap  between  the  linear  programming 
optimum  and  the  integer  optimum  was  less  that  1.  This  small  gap  apparently 
did  not  make  most  of  these  problems  easy  for  the  other  two  procedures,  as 
evidenced  by  their  performance  on  problems  1.11,  2.3,  2.4,  2.5,  2.7  and  2.8. 
Our  procedure,  however,  can  take  advantage  of  the  small  gap  due  to  the  use 
of  the  primal  heuristics. 

Table  7 shows  the  performance  of  our  algorithm,  still  without 
branching,  on  the  10  randomly  generated  problems  of  set  3.  Note  that 
6 of  these  problems  did  not  require  cuts.  As  to  the  remaining  4 problems, 
one  of  them  (3.5)  required  only  4 cuts,  while  the  other  three  required 
large  numbers  of  cuts  and  one  of  them  (3.8)  was  actually  not  solved  within 
the  time  limit  of  5 minutes.  Had  we  used  the  full  algorithm  (with  branching) 
on  these  3 problems,  the  number  of  cuts  and  the  time  needed  would  in  all 
likelihood  have  been  smaller.  However,  we  ran  the  full  version  of  the 
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Table  7.  Algorithm  of  section  2 without  branching 


Problem  data 

No.  of 

Time* 

No. 

m 

n 

cuts 

Total 

SGRAD 

i 

3.1 

100 

100 

0 

1.21 

0.57  : 

3.2 

100 

200 

0 

1.26 

0.48 

3.3 

100 

300 

0 

3.04 

1.85 

3.4 

100 

400 

0 

3.22 

2.13 

3.5 

100 

500 

4 

3.95 

1.51 

3.6 

100 

600 

146 

42.61 

19.03 

3.7 

100 

700 

59 

24.03 

14.38 

** 

3.8 

100 

800 

682 

>300.00 

65.06 

3.9 

100 

900 

0 

13.27 

11.18 

3.10 

100 

1000 

0 

8.30 

6.00 

* DEC  20/50  seconds 
**  Time  limit  exceeded 


algorithm  only  on  problem  3.8,  with  the  outcome  that  the  problem  was  solved 

in  92.24  seconds,  with  a search  tree  of  30  nodes  and  a total  of  362  cuts. 

Table  8 describes  the  performance  of  our  algorithm  (in  Its  complete  version) 

on  the  20  randomly  generated  test  problems  of  sets  4 and  5.  It  shows  the  value 

e of  the  optimum;  the  upper  and  lower  bounds,  as  well  as  the  number  of 
opt 

variables  left,  before  the  algorithm  first  branched;  the  number  of  nodes  and 
cuts,  and,  finally,  the  total  time  and  the  time  spent  on  subgradient  optimization. 
Six  out  of  the  10  problems  In  set  4,  and  4 out  of  the  10  problems  In  set  5,  did 
not  require  any  branching.  Of  those  problems  that  did  not  require  branching, 

4 in  set  4 and  2 in  set  5 did  not  require  cuts  either.  These  6 problems  had  a 
gap  of  less  than  1 between  the  linear  programing  optimum  and  the  Integer 
optimum,  and  for  some  of  them  the  linear  programing  optimum  may  actually  be 
integer  (since  we  don't  use  the  simplex  method,  we  do  not  necessarily  discover 
this  when  we  solve  a problem). 


' 


167.15 

110.52 

416.46 

215.41 

5.1 

253 

256 

250.58 

204 

30 

473 

5.2* ** 

307+ 

315 

299.32 

408 

51++ 

625 

5.3 

226 

. 

226 

226 

0 

1 

0 

5.4 

242 

247 

240.29 

258 

49 

765 

5.5 

211 

211 

211 

0 

1 

15 

5.6 

213 

213 

213 

0 

1 

10 

5.7  1 293  i 296  291.02 


288  ' 286.09 


5.9  279  ! 281  276.21 


265  265 


38.73  2 

32. 

248.65  152.62 

241.42  108.72 

140.61  94.60 

25.89  i 15.38 


* DEC  20/50  seconds 

**  Time  limit  exceeded 

+ Best  solution  found  before  exceeding  time  limit 
-m-  51  nodes  generated,  of  which  30  fathomed 


I 


-34- 


1 

As  to  the  remaining  problems  in  sets  4 and  5,  they  were  solved  with 
a reasonable  computational  effort,  in  terms  of  the  number  of  nodes  in  the 
search  tree  (never  more  than  50),  the  number  of  cutting  planes  (several 
hundred  at  most),  as  well  as  in  terms  of  computing  time  (between  about  20 
seconds  and  7 minutes),  except  for  problem  5.2,  which  could  not  be  solved 
within  the  time  limit  of  10  minutes.  The  best  solution  found  for  this 
problem,  with  a value  of  307,  is  at  most  2.337.  worse  than  the  optimum, 
since  <299. 32>  is  the  lower  bound  found  before  the  first  branching  occurred. 

From  Table  3 one  can  see  again  that  the  subgradient  procedure  in 
most  cases  takes  up  between  1/2  and  2/3  of  the  computational  effort.  The 
time  needed  to  solve  a problem  strongly  depends  on  the  number  of  variables 
left  before  one  has  to  branch:  there  is  a high  positive  correlation 
between  this  number,  and  the  number  of  nodes  in  the  search  tree.  There  is 
an  even  higher  correlation,  of  course,  between  the  number  of  nodes  in  the 
search  tree  and  the  total  time  needed  to  solve  the  problem.  On  the  other 
hand,  cuts  are  cheap  to  generate,  and  the  number  of  cuts  affects  the  total 
time  mainly  through  the  fact  that  after  every  ot  cuts  the  subgradient  procedure 
is  applied  (which  in  turn  is  costly).  This  can  be  seen,  for  instance,  by 
looking  at  the  4 problems  in  set  5 that  required  no  branching.  Problems  5.3 
and  5.10,  which  required  no  cuts  either,  took  26-27  seconds  to  be  solved. 
Problems  5.6  and  5.5,  which  required  10  and  15  cuts  respectively,  required 
only  33  and  39  seconds  respectively,  i.e.,  about  1.24-1.45  times  the  time 
required  for  problems  5.3  and  5.10.  The  reason  for  this  is  that  the 
subgradient  procedure  was  applied  once  to  each  of  problems  5.3  and  5.10, 
and  twice  to  each  of  problems  5.6  and  5.5  The  reason  the  computational 
effort  increased  less  than  twice  for  the  second  pair  of  problems,  is  that 
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SGRAD  is  Che  most  time  consuming  when  applied  for  the  first  time  to  a 
problem,  as  discussed  in  section  5. 

As  can  be  seen  from  the  above  computational  experience,  the  algorithm 
discussed  here  is  a reasonably  reliable,  efficient  tool  for  solving  large, 
sparse  set  covering  problems,  as  well  as  for  finding  good  approximate  solutions 
to  problems  that  are  too  hard  to  be  solved  exactly.  However,  the  strength 
of  the  family  of  cuts  from  conditional  bounds  strongly  depends  on  sparsity. 

As  problem  density  increases,  the  strength  of  the  cuts  diminishes,  and  so 
does  the  efficier -y  of  this  algorithm,  at  least  in  its  versions  that  we  have 
tested.  For  relatively  small  problem  sizes,  the  algorithm  can  cope  well 
with  somewhat  higher  density,  as  illustrated  by  problem  sets  1 (77  density) 
and  2 (2%  - 117  density),  on  which  it  clearly  outperformed  the  other  two 
procedures  that  had  been  tried  on  those  problems.  But  for  larger  problems, 
this  is  unlikely  to  be  the  case.  To  see  how  fast  the  algorithm's 
performance  declines  with  problem  density,  we  have  run  the  code  on  a set 
of  5 randomly  generated  200  x 1000  problems  with  57  density  (problem  set  6). 

The  results  are  shown  in  Table  9. 


Table  9.  Complete  algorithm  on  57  density  nroblems 
(m  = 200,  n = 1000) 


No. 

Best 

zu 

“St 

No.  of  nodes 
in  search  tree 

No.  of 

cuts 

Time* 

zu 

ZL 

No.  of 
variables 
left 

Gene- 

rated 

Fathomed 

Total 

SGRAD 

6 . 1+ 

i 

141 

143 

132.07 

189 

143 

116 

2160 

>1800 

842 

6.2+ 

149 

157 

139.83 

244 

173 

153 

2412 

>1800 

920 

6.3 

145 

145 

139.60 

163 

163 

2481 

1548 

787 

6.4 

131 

135 

128.05 

21 

21 

275 

194 

106  1 

6.5+ 

161 

173 

152.52 

161 

140 

2425 

>1800 

884 

* DEC  20/50  seconds 


+ 


Time  limit  exceeded 
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Two  of  the  5 problems  were  solved  within  the  30  minutes  time  limit. 

By  no  means  accidentally,  these  two  happen  to  be  the  problems  with  the  smallest 
numbers  of  variables  left  before  branching.  For  the  remaining  3 problems, 
the  best  solution  found  is  guaranteed  to  be  within  6%,  6.4%  and  5.2% 
respectively,  of  the  optimum,  though  it  could  of  course  be  much  closer. 

Though  the  algorithm  exceeded  the  time  limit  before  finishing  3 of  the 
five  problems,  from  the  data  of  Table  6 (ratio  of  fathomed  versus  unfathomed 
nodes,  ratio  between  numbers  of  variables  left  before  branching  for  the 
unsolved  and  the  solved  problems),  it  seems  that  all  the  problems  could 
be  solved  by  a not  too  drastic  extension  of  the  time  limit.  In  general, 
the  data  of  Tables  7,  8 and  9 show  a certain  reliability  of  the  procedure 
on  randomly  generated  problems:  the  results  are  not  wildly  erratic,  as  it 
so  often  happens  with  integer  programming  algorithms. 

It  is  possible  that  a different  version  of  our  approach,  that  would 
generate  a larger  number  of  cuts  but  retain  only  the  stronger  ones,  and 
rely  more  heavily  on  branching  from  a disjunction  (7),  would  be  more 
successful  on  higher  density  problems.  It  is  also  possible,  even  highly 
probable,  that  a different  backtracking  rule,  that  would  lead  to  earlier 
processing  of  the  nodes  with  the  best  lower  bounds,  would  provide  a con- 
siderably better  bound  on  the  quality  of  the  solution  obtained  when  the 
procedure  stops  prematurely  because  of  the  time  limit.  These  ideas,  however, 
have  not  yet  been  tested. 
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Ingredients  was  Implemented  and  tested  In  several  versions.  The  variant 
of  the  algorithm  that  emerged  as  best  was  run  on  55  randomly  generated 
test  problems  (20  of  them  from  the  literature'),  with  up  to  200  constraints 
and  2000  variables.  The  results  show  the  algorithm  to  be  more  reliable 
and  efficient  than  earlier  procedures  on  large,  sparse  set  covering  problems. 


